    Sample Mixed-Based Data Augmentation for Domestic Audio Tagging

    Audio tagging has attracted increasing attention over the last decade and has potential applications in many fields. The objective of audio tagging is to predict the labels of an audio clip. Recently, deep learning methods have been applied to audio tagging and have achieved state-of-the-art performance. However, because audio tagging datasets such as the DCASE data are small, the trained models tend to overfit and generalize poorly to new data. Previous data augmentation methods such as pitch shifting, time stretching and adding background noise do not yield much improvement in audio tagging. In this paper, we explore sample-mixed data augmentation for the domestic audio tagging task, including mixup, SamplePairing and extrapolation. As the baseline system, we use a convolutional recurrent neural network (CRNN) with an attention module, taking log-scaled mel spectra as input. In our experiments, the mixup approach achieves a state-of-the-art equal error rate (EER) of 0.10 on the DCASE 2016 Task 4 dataset, outperforming the baseline system without data augmentation. Comment: submitted to the workshop of Detection and Classification of Acoustic Scenes and Events 2018 (DCASE 2018), 19-20 November 2018, Surrey, UK
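
The three sample-mixing schemes named above can be sketched in a few lines. This is a minimal NumPy sketch, assuming each input is a log-mel spectrogram array and each label is a multi-hot tag vector. The function names, the `alpha` and `lam` defaults and the example shapes are hypothetical. SamplePairing here follows the original recipe of keeping the first sample's label, and the extrapolation rule (pushing one sample away from another) is an assumed form, not necessarily the paper's exact formulation.

```python
import numpy as np


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: convex combination of two inputs and of their multi-hot labels."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2


def sample_pairing(x1, y1, x2):
    """SamplePairing: average two inputs and keep the first sample's label."""
    return (x1 + x2) / 2.0, y1


def extrapolate(x1, y1, x2, lam=0.5):
    """Extrapolation (assumed form): push x1 away from x2, keep x1's label."""
    return x1 + lam * (x1 - x2), y1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical shapes: 240 frames x 64 mel bins, 7 tag classes.
    x_a, x_b = rng.normal(size=(2, 240, 64))
    y_a = np.array([1, 0, 0, 1, 0, 0, 0], dtype=float)
    y_b = np.array([0, 1, 0, 0, 0, 0, 1], dtype=float)
    x_mix, y_mix = mixup(x_a, y_a, x_b, y_b, rng=rng)
    print(x_mix.shape, y_mix)
```

In training, the second sample would typically be drawn at random from the same mini-batch; mixup's soft labels fit naturally with a binary cross-entropy loss for multi-label tagging.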

    A Unified Framework for Multi-intent Spoken Language Understanding with Prompting

    Multi-intent Spoken Language Understanding (SLU) has great potential for widespread application. Jointly modeling Intent Detection (ID) and Slot Filling (SF) provides a channel to exploit the correlation between intents and slots. However, current approaches tend to formulate these two sub-tasks differently, which leads to two issues: 1) it hinders models from extracting shared features effectively; 2) rather complicated structures are introduced to enhance expressive ability, at the cost of the frameworks' interpretability. In this work, we describe a Prompt-based Spoken Language Understanding (PromptSLU) framework that intuitively unifies the two sub-tasks into the same form on top of a common pre-trained Seq2Seq model. In detail, ID and SF are completed by concisely filling the utterance into task-specific prompt templates as input, and both sub-tasks share an output format of key-value pair sequences. Furthermore, the intents, whose number varies per utterance, are predicted first and then naturally embedded into prompts to guide slot-value pair inference from a semantic perspective. Finally, inspired by prevalent multi-task learning, we introduce an auxiliary sub-task that helps the model learn relationships among the provided labels. Experimental results show that our framework outperforms several state-of-the-art baselines on two public datasets. Comment: Work in progress
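
The two-stage prompting described above (intents first, then intent-conditioned slot filling, with a shared key-value output format) can be sketched as plain string handling around a Seq2Seq model. The template wording, the `key = value ; ...` output syntax, the intent labels and all function names below are hypothetical; the model call itself is omitted, and the hard-coded output strings stand in for what a Seq2Seq model would generate.

```python
from typing import List, Tuple

# Hypothetical prompt templates; the actual PromptSLU wording may differ.
ID_TEMPLATE = "intent detection: {utterance}"
SF_TEMPLATE = "slot filling for intents [{intents}]: {utterance}"


def build_id_prompt(utterance: str) -> str:
    """Fill the utterance into the intent-detection template."""
    return ID_TEMPLATE.format(utterance=utterance)


def build_sf_prompt(utterance: str, intents: List[str]) -> str:
    """Embed the already-predicted intents in the slot-filling prompt."""
    return SF_TEMPLATE.format(intents=", ".join(intents), utterance=utterance)


def parse_kv_sequence(output: str) -> List[Tuple[str, str]]:
    """Parse a shared 'key = value ; key = value' output into ordered pairs.

    A list (not a dict) is returned so that repeated keys, e.g. several
    intents in one utterance, are all kept. Malformed chunks are skipped.
    """
    pairs = []
    for chunk in output.split(";"):
        if "=" in chunk:
            key, value = chunk.split("=", 1)
            pairs.append((key.strip(), value.strip()))
    return pairs


def intents_from_output(output: str) -> List[str]:
    """Collect every value whose key is 'intent'."""
    return [value for key, value in parse_kv_sequence(output) if key == "intent"]


if __name__ == "__main__":
    utterance = "play adele and wake me up at 7"
    print(build_id_prompt(utterance))
    # Stand-in for a Seq2Seq model's output on the ID prompt (hypothetical).
    id_output = "intent = PlayMusic ; intent = SetAlarm"
    intents = intents_from_output(id_output)
    print(build_sf_prompt(utterance, intents))
    # Stand-in for the model's output on the SF prompt (hypothetical).
    sf_output = "artist = adele ; time = 7"
    print(parse_kv_sequence(sf_output))
```

Because both sub-tasks emit the same key-value syntax, a single parser serves ID and SF alike, which mirrors the unified output format the abstract describes.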

    Pathological Evidence Exploration in Deep Retinal Image Diagnosis

    Though deep learning has performed successfully in classifying the label and severity stage of certain diseases, most such models give little evidence of how they make their predictions. Here, we propose to explore the interpretability of deep learning applications in medical diagnosis. Inspired by Koch's Postulates, a well-known strategy in medical research for identifying the pathogen of a disease, we define a pathological descriptor that can be extracted from the activated neurons of a diabetic retinopathy detector. To visualize the symptoms and features encoded in this descriptor, we propose a GAN-based method to synthesize a pathological retinal image given the descriptor and a binary vessel segmentation. Moreover, with this descriptor, we can arbitrarily manipulate the position and quantity of lesions. As verified by a panel of 5 licensed ophthalmologists, our synthesized images carry the symptoms that are directly related to diabetic retinopathy diagnosis. The panel survey also shows that our generated images are both qualitatively and quantitatively superior to those of existing methods. Comment: to appear in AAAI (2019). The first two authors contributed equally to the paper. Corresponding Author: Feng L

    Surface electromyographic (sEMG) activity of the suprahyoid and sternocleidomastoid muscles in pitch and loudness control

    Purpose: This study set out to determine the contributions of the suprahyoid and sternocleidomastoid (SCM) muscles to changing pitch and loudness during phonation in vocally healthy individuals. Method: Thirty-nine participants were initially recruited, and the twenty-nine who passed the screening test (Voice Handicap Index [VHI]-10 score ≤11, auditory-perceptual voice rating score ≤2) were included (mean age = 28.2 years). For all participants, surface electromyographic (sEMG) activity was recorded from the bilateral suprahyoid and SCM muscles while they produced the vowels /a/, /i/ and /u/ at natural (baseline) levels and at different pitch (+3, +6, −3, −6 semitones) and loudness (+5, +10, −5 dB) levels. Linear mixed-effects models were fitted to determine the factors influencing the root-mean-square percentage of maximal voluntary contraction (RMS %MVC) of the sEMG signals. Results: Compared with the baseline, a significant decrease in RMS %MVC was found in the suprahyoid muscles during phonation at lower pitches (−3 and −6 semitones) and lower loudness (−5 dB). However, no significant change was detected when phonating at higher pitch (+3 and +6 semitones) and loudness (+5 and +10 dB) levels. Among the three vowels, /i/ showed significantly higher RMS %MVC than /a/ and /u/. The SCM muscles, however, did not show any significant change in RMS %MVC across the vowels or with the pitch and loudness changes. When the two sides were compared, significantly higher RMS %MVC was found on the right side than on the left side for the suprahyoid (in pitch and loudness control) and SCM (in pitch control) muscles. Conclusions: Suprahyoid muscle activity decreased significantly when producing lower pitches and intensities compared with the natural baselines. Producing a sustained /i/ required significantly more suprahyoid muscle activity than /a/ and /u/.
    The SCM muscles did not show much sEMG activity at any of the pitch and loudness levels, so they could potentially be used for calibration or normalization of peri-laryngeal sEMG measurements. The findings also showed a tendency toward bilateral asymmetry in the use of the suprahyoid and SCM muscles.
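
The RMS %MVC measure used in this study normalizes each task recording by the participant's maximal voluntary contraction, i.e. %MVC = 100 × RMS(task) / RMS(MVC). A minimal sketch, assuming already filtered sEMG segments and a whole-trial RMS as the MVC reference; many studies use a peak moving-window RMS instead, and the study's exact filtering and windowing are not stated here. The function names and the synthetic signals are hypothetical.

```python
import numpy as np


def rms(signal: np.ndarray) -> float:
    """Root-mean-square amplitude of a 1-D sEMG segment."""
    return float(np.sqrt(np.mean(np.square(signal))))


def rms_percent_mvc(task_emg: np.ndarray, mvc_emg: np.ndarray) -> float:
    """Task RMS expressed as a percentage of the RMS recorded during MVC."""
    reference = rms(mvc_emg)
    if reference == 0.0:
        raise ValueError("MVC reference segment has zero RMS")
    return 100.0 * rms(task_emg) / reference


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mvc = rng.normal(scale=1.0, size=2000)    # synthetic MVC trial
    task = rng.normal(scale=0.25, size=2000)  # synthetic phonation trial
    print(f"{rms_percent_mvc(task, mvc):.1f} %MVC")
```

Normalizing to MVC makes values comparable across muscles and participants whose raw sEMG amplitudes differ because of electrode placement or tissue thickness.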

    Arsenic bioleaching in medical realgar ore and arsenic-bearing refractory gold ore by combination of Acidithiobacillus ferrooxidans and Acidithiobacillus thiooxidans

    Purpose: To develop a novel biotechnological method for removing toxic arsenic from two representative arsenic-containing ores using different mixed mesophilic acidophiles. Methods: The two types of arsenic-containing ores were bioleached by either mixed arsenic-unadapted Acidithiobacillus ferrooxidans and Acidithiobacillus thiooxidans or mixed arsenic-adapted cultures. Arsenic bioleaching ratios in the various leachates were determined and compared. Results: The maximum arsenic leaching ratio from realgar in the presence of mixed adapted cultures was 28.6 %, whereas with mixed unadapted strains it was only 12.4 %. The maximum arsenic leaching ratios from arsenic-bearing refractory gold ore with mixed adapted or unadapted strains were 45.0 and 22.9 %, respectively. Oxidation of these two ores by sulfuric acid alone was insignificant: in the absence of any bacterium, the maximum arsenic leaching ratios of realgar and arsenic-bearing refractory gold ore were only 2.8 and 11.2 %, respectively. Conclusion: The arsenic leaching ratios of realgar and refractory gold ore can be enhanced significantly in the presence of arsenic-adapted mesophilic acidophiles. Keywords: Adaptation, Acidithiobacillus ferrooxidans, Acidithiobacillus thiooxidans, Realgar, Arsenic-bearing refractory gold ore, Arsenic leaching ratio

    Protective effect of midazolam against convulsion in neonatal rats via down-regulation of LC3 and Beclin-1 expression

    Purpose: To investigate the effect of midazolam on the growth of neurocytes in vitro and in neonatal rats. Methods: Neurocyte proliferation and lactate dehydrogenase (LDH) activity were assessed by MTT and LDH assays, respectively. Western blotting was used to determine the effect of midazolam on LC3, Bax, p62 and Beclin-1 protein expression. Results: Midazolam treatment significantly (p < 0.05) alleviated the convulsion-induced suppression of neurocyte proliferation. In the convulsion model of neurocytes, midazolam exposure suppressed LC3, Bax, p62 and Beclin-1 protein expression, as well as LDH, caspase-3, caspase-8 and caspase-9 activities. Treatment with 3-MA (an autophagy inhibitor) also significantly (p < 0.05) promoted neurocyte viability after convulsion induction. In convulsion-induced neurocytes, 3-MA exposure suppressed the expression of caspase-3/8/9, LC3, Bax, Beclin-1 and p62, while midazolam treatment of rats with convulsion markedly decreased brain water content and neurocyte apoptosis (p < 0.05). Midazolam treatment inhibited LC3, p62 and Beclin-1 expression in the rat model of convulsion. Conclusion: Midazolam promotes neurocyte proliferation and inhibits edema development via downregulation of autophagy. Therefore, midazolam can potentially be used to treat convulsion, but further studies are needed first. Keywords: Convulsion, Neurocytes, Caspase, Autophagy, Mitochondrial pathway